{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/embeddings/huggingface.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Local Embeddings with HuggingFace\n",
    "\n",
    "LlamaIndex has support for HuggingFace embedding models, including Sentence Transformer models like BGE, Mixedbread, Nomic, Jina, E5, etc. We can use these models to create embeddings for our documents and queries for retrieval.\n",
    "\n",
    "Furthermore, we provide utilities to create and use ONNX and OpenVINO models using the [Optimum library](https://huggingface.co/docs/optimum) from HuggingFace."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## HuggingFaceEmbedding\n",
    "\n",
    "The base `HuggingFaceEmbedding` class is a generic wrapper around any HuggingFace model for embeddings. All [embedding models](https://huggingface.co/models?library=sentence-transformers) on Hugging Face should work. You can refer to the [embeddings leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for more recommendations.\n",
    "\n",
    "This class depends on the sentence-transformers package, which you can install with `pip install sentence-transformers`.\n",
    "\n",
    "NOTE: if you were previously using a `HuggingFaceEmbeddings` from LangChain, this should give equivalent results."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-embeddings-huggingface"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n",
    "\n",
    "# loads https://huggingface.co/BAAI/bge-small-en-v1.5\n",
    "embed_model = HuggingFaceEmbedding(model_name=\"BAAI/bge-small-en-v1.5\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "384\n",
      "[-0.003275700844824314, -0.011690810322761536, 0.041559211909770966, -0.03814814239740372, 0.024183044210076332]\n"
     ]
    }
   ],
   "source": [
    "embeddings = embed_model.get_text_embedding(\"Hello World!\")\n",
    "print(len(embeddings))\n",
    "print(embeddings[:5])"
   ]
  },
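  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Queries and documents can also be embedded separately. As a minimal sketch of the base embedding API (the strings below are just illustrative), `get_query_embedding` embeds a retrieval query, while `get_text_embedding_batch` embeds several documents in one batched call:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# embed a retrieval query (instruction-tuned models like BGE may\n",
    "# prepend a model-specific query instruction, unlike get_text_embedding)\n",
    "query_embedding = embed_model.get_query_embedding(\"What is the capital of France?\")\n",
    "\n",
    "# embed multiple documents in one batched call\n",
    "doc_embeddings = embed_model.get_text_embedding_batch(\n",
    "    [\"Paris is the capital of France.\", \"Berlin is the capital of Germany.\"]\n",
    ")\n",
    "print(len(query_embedding), len(doc_embeddings))"
   ]
  },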
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Benchmarking\n",
    "\n",
    "Let's try comparing using a classic large document -- the IPCC climate report, chapter 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "\n",
      "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\n",
      "100 20.7M  100 20.7M    0     0  69.6M      0 --:--:-- --:--:-- --:--:-- 70.0M\n"
     ]
    }
   ],
   "source": [
    "!curl https://www.ipcc.ch/report/ar6/wg2/downloads/report/IPCC_AR6_WGII_Chapter03.pdf --output IPCC_AR6_WGII_Chapter03.pdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
    "from llama_index.core import Settings\n",
    "\n",
    "documents = SimpleDirectoryReader(\n",
    "    input_files=[\"IPCC_AR6_WGII_Chapter03.pdf\"]\n",
    ").load_data()"
   ]
  },
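  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick look at what was loaded; with the default PDF reader, each page should become one `Document`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# one Document per PDF page with the default reader\n",
    "print(len(documents))\n",
    "print(documents[0].text[:200])"
   ]
  },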
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Base HuggingFace Embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n",
    "\n",
    "# loads BAAI/bge-small-en-v1.5 with the default torch backend\n",
    "embed_model = HuggingFaceEmbedding(\n",
    "    model_name=\"BAAI/bge-small-en-v1.5\",\n",
    "    device=\"cpu\",\n",
    "    embed_batch_size=8,\n",
    ")\n",
    "test_embeds = embed_model.get_text_embedding(\"Hello World!\")\n",
    "\n",
    "Settings.embed_model = embed_model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Parsing nodes: 100%|██████████| 172/172 [00:00<00:00, 428.44it/s]\n",
      "Generating embeddings: 100%|██████████| 459/459 [00:19<00:00, 23.32it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "20.2 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -r 1 -n 1\n",
    "index = VectorStoreIndex.from_documents(documents, show_progress=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ONNX Embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# pip install sentence-transformers[onnx]\n",
    "\n",
    "# loads BAAI/bge-small-en-v1.5 with the onnx backend\n",
    "embed_model = HuggingFaceEmbedding(\n",
    "    model_name=\"BAAI/bge-small-en-v1.5\",\n",
    "    device=\"cpu\",\n",
    "    backend=\"onnx\",\n",
    "    model_kwargs={\n",
    "        \"provider\": \"CPUExecutionProvider\"\n",
    "    },  # For ONNX, you can specify the provider, see https://sbert.net/docs/sentence_transformer/usage/efficiency.html\n",
    ")\n",
    "test_embeds = embed_model.get_text_embedding(\"Hello World!\")\n",
    "\n",
    "Settings.embed_model = embed_model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Parsing nodes: 100%|██████████| 172/172 [00:00<00:00, 421.63it/s]\n",
      "Generating embeddings: 100%|██████████| 459/459 [00:31<00:00, 14.53it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "32.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "%%timeit -r 1 -n 1\n",
    "index = VectorStoreIndex.from_documents(documents, show_progress=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### OpenVINO Embeddings\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# pip install sentence-transformers[openvino]\n",
    "\n",
    "# loads BAAI/bge-small-en-v1.5 with the openvino backend\n",
    "embed_model = HuggingFaceEmbedding(\n",
    "    model_name=\"BAAI/bge-small-en-v1.5\",\n",
    "    device=\"cpu\",\n",
    "    backend=\"openvino\",  # OpenVINO is very strong on CPUs\n",
    "    revision=\"refs/pr/16\",  # BAAI/bge-small-en-v1.5 itself doesn't have an OpenVINO model currently, but there's a PR with it that we can load: https://huggingface.co/BAAI/bge-small-en-v1.5/discussions/16\n",
    "    model_kwargs={\n",
    "        \"file_name\": \"openvino_model_qint8_quantized.xml\"\n",
    "    },  # If we're using an optimized/quantized model, we need to specify the file name like this\n",
    ")\n",
    "test_embeds = embed_model.get_text_embedding(\"Hello World!\")\n",
    "\n",
    "Settings.embed_model = embed_model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Parsing nodes: 100%|██████████| 172/172 [00:00<00:00, 403.15it/s]\n",
      "Generating embeddings: 100%|██████████| 459/459 [00:08<00:00, 53.83it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "9.03 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -r 1 -n 1\n",
    "index = VectorStoreIndex.from_documents(documents, show_progress=True)"
   ]
  },
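  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On this particular CPU run, OpenVINO (~9 s) was the fastest backend, ahead of the default torch backend (~20 s) and ONNX (~32 s); your numbers will vary with hardware, model, and batch size.\n",
    "\n",
    "As a final sanity check, we can retrieve from an index built with the current embedding model. This is a minimal sketch (the query string is just illustrative); note that variables assigned inside `%%timeit` cells do not persist, so we rebuild the index first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# rebuild the index outside of %%timeit so we keep a reference to it\n",
    "index = VectorStoreIndex.from_documents(documents)\n",
    "\n",
    "retriever = index.as_retriever(similarity_top_k=2)\n",
    "nodes = retriever.retrieve(\"How does climate change affect ocean ecosystems?\")\n",
    "for node_with_score in nodes:\n",
    "    print(node_with_score.score, node_with_score.node.get_content()[:150])"
   ]
  },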
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### References\n",
    "\n",
    "* [Local Embedding Models](https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings/#local-embedding-models) explains more about using local models like these.\n",
    "* [Sentence Transformers > Speeding up Inference](https://sbert.net/docs/sentence_transformer/usage/efficiency.html) contains extensive documentation on how to use the backend options effectively, including optimization and quantization for ONNX and OpenVINO."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_index",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
